Abstract

Previous work has shown that there are two major complexity
barriers in the synthesis of fault-tolerant distributed programs,
namely generation of the fault-span (the set of states reachable in
the presence of faults) and resolution of deadlock states, where the
program has no outgoing transitions. Although symbolic techniques
can improve the performance of synthesis algorithms by orders of
magnitude, efficient heuristics are still needed to overcome the
aforementioned obstacles. Thus, motivated by the idea of
partitioning the transition relation of distributed programs across
multiple threads, in this paper we introduce an efficient parallel
(shared memory) algorithm for resolving deadlock states in symbolic
synthesis of distributed programs. In spite of the notorious
resistance of symbolic algorithms to parallelization, experimental
results show that our parallel algorithm exhibits superlinear
performance improvement.

Keywords: Program transformation, Program synthesis, Parallel
algorithm, Multi-core, Distributed programs, Deadlock resolution,
Fault-tolerance.

1  Introduction 

Automatically deriving programs that are correct-by-construction
has been one of the most ambitious goals in computer science for
several decades. Such automatic construction of programs is
especially useful in dependable mission/safety-critical systems
where correctness plays a crucial role. One way to achieve this goal
is to use program synthesis techniques. Program synthesis is
especially beneficial in program maintenance, where system
requirements constantly evolve and, thus, programs need to be
revised. In the context of distributed systems, program synthesis is
desirable when an existing program is subject to uncontrollable
faults. Indeed, since it may be virtually impossible to anticipate at
design time all faults that a distributed program may be subject to,
it is highly advantageous for designers of fault-tolerant systems to
have access to synthesis methods that incrementally add
fault-tolerance to a given distributed fault-intolerant program.
Intuitively, by a fault-tolerant program, we mean a program that
meets its safety and liveness requirements in both the absence and
the presence of faults. The corresponding synthesis problem focuses
on analyzing the existing fault-intolerant program to add/remove
transitions/actions so that the revised program is fault-tolerant.
Note that, by their nature, such synthesis algorithms are offline
because they focus on transforming one program into another.

One crucial problem in program synthesis is the time and space
complexity. To manage these complexities, in our previous work
[1,2], we proposed a set of enumerative and symbolic (BDD-based)
techniques for adding fault-tolerance to existing distributed
fault-intolerant programs. In order to synthesize a fault-tolerant
program, the algorithms in [1,2] repeat a sequence of steps, namely
(1) generation of the fault-span (the set of states reachable by
program and fault transitions), (2) identifying and removing unsafe
transitions, (3) resolving deadlock states, and (4) reconstructing
the invariant predicate, until a fixed point is reached. We also
showed that symbolic techniques [2] improve the performance of
synthesis by several orders of magnitude, paving the path for
synthesizing moderate-sized programs with a state space of size
$10^{30}$ and beyond. Based on the analysis of the experimental
results from [2], we observed that, depending upon the structure of
the given distributed intolerant program, the performance of
synthesis suffers from two major complexity obstacles, namely
generation of the fault-span and resolution of deadlock states. Thus,
more efficient techniques are still needed to overcome the
aforementioned bottlenecks. In this paper, we focus on the second
problem, i.e., resolution of deadlock states. Deadlock resolution is
especially
crucial in the context of dependable systems, as it guarantees
that the synthesized fault-tolerant program meets its liveness
requirements even in the presence of faults.

Report Documentation Page (Standard Form 298, Rev. 8-98)
Report date: 2008
Title: Parallelizing Deadlock Resolution in Symbolic Synthesis of
Distributed Programs
Performing organization: Michigan State University, Department of
Computer Science and Engineering, East Lansing, MI 48824
Distribution/availability: Approved for public release; distribution
unlimited
Security classification: unclassified (report, abstract, this page)
Number of pages: 12

1.1  The  Deadlock  Resolution  Problem 

We now describe the issue of deadlock resolution using the
Byzantine agreement (denoted BA) problem [3]. We omit other steps
involved in synthesizing a fault-tolerant version of BA (e.g.,
fault-span generation, preserving safety, and reconstructing the
invariant predicate), as they are not in the scope of this paper. BA
consists of a general, say $g$, and three (or more) non-general
processes: $j$, $k$, and $l$. Each process of BA maintains a decision
$d$: for the general, the decision can be either 0 or 1, and for the
non-general processes, the decision can be 0, 1, or $\bot$, where the
value $\bot$ denotes that the corresponding process has not yet
received the decision from the general. Each non-general process also
maintains a Boolean variable $f$ that denotes whether that process
has finalized its decision. For each process, a Boolean variable $b$
shows whether or not the process is Byzantine. In the
fault-intolerant version of this program, each non-general process
copies the decision from the general and then finalizes (outputs)
that decision, provided it is non-Byzantine. A fault transition can
cause a process to become Byzantine, if no other process is initially
Byzantine. Also, a fault can change the $d$ and $f$ values of a
Byzantine process. Let the sequence
$\langle x_1, x_2, x_3, x_4 \rangle$ denote the set of states with
respect to the decision values of processes, i.e., $x_1 = d.g$,
$x_2 = d.j$, $x_3 = d.k$, and $x_4 = d.l$. In this notation, an
overlined (respectively, underlined) $d$-value shows that the
corresponding process has finalized its decision (respectively, is
Byzantine). Now consider the following scenarios:

• Starting from a state $s_0$ in $\langle 1, \bot, \bot, 1 \rangle$,
where the general and process $l$ agree on decision 1 and
processes $j$ and $k$ are undecided, the program may reach the
following sequence of states due to occurrence of faults (denoted
$\dashrightarrow$) and execution of program actions (denoted
$\rightarrow$):
$\langle 1, \bot, \bot, 1 \rangle \dashrightarrow
\langle \underline{1}, \bot, \bot, 1 \rangle \dashrightarrow
\langle \underline{0}, \bot, \bot, 1 \rangle \rightarrow
\langle \underline{0}, 0, \bot, 1 \rangle \rightarrow
\langle \underline{0}, 0, 0, 1 \rangle$.
Let $s_1$ be a state in $\langle \underline{0}, 0, 0, 1 \rangle$,
where the Byzantine general $g$ and non-general processes $j$ and
$k$ agree on decision 0, but process $l$ has decided on 1. Now,
consider the tasks for a synthesis algorithm in dealing with state
$s_1$. Note that no process can determine whether other processes
have finalized their decision due to the issue of distribution.
Thus, the synthesis algorithm rules out transitions that originate
from $s_1$ and in which $j$ finalizes its decision, as they would
violate safety (i.e., agreement). Likewise, it cannot allow $k$ and
$l$ to finalize either. We call states such as $s_1$ deadlock
states, since the program cannot proceed with its execution. A
synthesis algorithm can resolve this deadlock state by simply
adding a recovery transition that changes the decision of $l$ to 0,
which results in reaching a legitimate state without violating
safety. After adding such transitions, in the next iteration of the
synthesis algorithm, we can allow $j$ and $k$ to finalize their
decision after concluding that
$\langle \underline{0}, 0, 0, \overline{1} \rangle$ (i.e., where $l$
is not Byzantine and has finalized) is not reached.

• Now, consider the scenario where $s_0$ reaches the following
sequence of states:
$\langle 1, \bot, \bot, 1 \rangle \rightarrow
\langle 1, \bot, \bot, \overline{1} \rangle \dashrightarrow
\langle \underline{1}, \bot, \bot, \overline{1} \rangle \dashrightarrow
\langle \underline{0}, \bot, \bot, \overline{1} \rangle \rightarrow
\langle \underline{0}, 0, \bot, \overline{1} \rangle \rightarrow
\langle \underline{0}, 0, 0, \overline{1} \rangle$.
Let $s_2$ be a state in
$\langle \underline{0}, 0, 0, \overline{1} \rangle$, where
non-general processes $j$ and $k$ agree with the Byzantine general
on decision 0, but process $l$ has finalized its decision on 1.
Obviously, $s_2$ is also a deadlock state. However, unlike $s_1$ in
the previous scenario, since process $l$ has finalized its
decision, we cannot resolve $s_2$ by adding safe recovery. One
approach to deal with such deadlock states is to simply eliminate
them (i.e., make them unreachable). However, since we require that
no new deadlock states be created during elimination of a deadlock
state, a respective deadlock resolution algorithm involves many
backtracking steps. In particular, in order to resolve $s_2$, the
algorithm needs to explore the reachability graph and remove the
transition that allows a process to finalize its decision while
there exist two undecided processes.

In [2], we observed that in order to automatically synthesize a
fault-tolerant version of BA identical to the one by Lamport,
Shostak, and Pease [3], 92% of the total synthesis time is spent
resolving deadlock states.

1.2  Contributions 

With this motivation, in this paper, we introduce a parallel
BDD-based algorithm for resolving deadlock states in distributed
programs that are subject to a set of faults. We specifically design
our algorithm for multiprocessor architectures with shared memory
(e.g., multi-core processors) due to their availability in virtually
any organization. Intuitively, our algorithm partitions the
transition relation of the given intolerant program across multiple
threads, where each thread works on a different processor core. The
algorithm makes no assumptions about the structure of a given program
(e.g., set of transitions, number of distributed processes, or its
reachable states) in order to resolve deadlock states. Thus, we
expect the algorithm to be generally applicable to a wide variety of
distributed programs. Our parallel algorithm tends to require more
memory than its sequential version. However, based on our
experimental results, unlike model checking, BDD-based synthesis
algorithms run out of time before they run out of memory. Hence, the
increased space complexity is unlikely to be a bottleneck during
synthesis.

We note that symbolic algorithms are known to be notoriously hard
to parallelize due to the interdependence among data structures
involved in such algorithms. As a matter of fact, while parallel
implementations of symbolic model checkers are often successful in
increasing available memory, the speedup gained from such techniques
is limited. This is largely due to the irregular nature of the
state-space generation task and the resulting high parallel
overheads such as load imbalance and scheduling of small
computations. Although some results in the literature (e.g., [4])
have concluded that parallelization of symbolic algorithms involves
too many interrelated factors, which leads to inefficiency in terms
of speedups, we argue that parallelization based on partitioning the
transition relation is remarkably efficient, as it can potentially
minimize the interdependence among data structures such as BDDs. In
fact, our experiments show that our parallel algorithm exhibits
superlinear speedup as compared to the sequential algorithm.
Organization. The rest of the paper is organized as follows. In
Sections 2 and 3, we present precise definitions for distributed
programs, specifications, and fault-tolerance. We formally state the
problem of synthesizing fault-tolerant programs in Section 4.
Section 5 describes our parallel symbolic algorithm for deadlock
resolution. Subsequently, experimental results and analysis are
presented in Section 6. Related work is discussed in Section 7.
Finally, we conclude in Section 8.

2  Distributed Programs and Specifications

Let $V = \{v_0, v_1, \ldots, v_n\}$ be a finite set of Boolean
variables. A state is determined by the function
$s : V \mapsto \{true, false\}$, which maps each variable in $V$ to
either true or false. Thus, we represent a state $s$ by the
conjunction $s = \bigwedge_{j=0}^{n} l(v_j)$, where $v_j \in V$ for
all $j$, and $l(v_j)$ denotes a literal, which is either $v_j$ itself
or its negation $\neg v_j$. Since non-Boolean variables with finite
domain $D$ can be represented by $\log(|D|)$ Boolean variables, our
notion of state is not restricted to Boolean variables.
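
To make the encoding of finite-domain variables concrete, the following is a minimal sketch in Python. It assumes the logarithm is taken base 2 and rounded up; the helper names `bits_needed` and `encode`, and the three-valued example domain, are illustrative and do not come from the paper.

```python
from math import ceil, log2

def bits_needed(domain_size):
    """Number of Boolean variables needed for a domain of the given size
    (assumption: base-2 logarithm, rounded up, at least one bit)."""
    return max(1, ceil(log2(domain_size)))

def encode(index, width):
    """Encode a domain index as a tuple of Booleans, least significant bit first."""
    return tuple(bool((index >> i) & 1) for i in range(width))

# Example: a decision value that is 0, 1, or still undecided.
domain = [0, 1, "undecided"]
width = bits_needed(len(domain))
codes = {value: encode(i, width) for i, value in enumerate(domain)}
```

With three domain values, two Boolean variables suffice, and each value receives a distinct assignment to those two variables.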

Definition 2.1 (state predicate) A state predicate is a finite set
of states. Formally, we specify a state predicate
$S = \{s_0, s_1, \ldots, s_m\}$ by the disjunction
$S = \bigvee_{i=0}^{m} (s_i)$. ■

Observe that although the formula defined in Definition 2.1 is in
disjunctive normal form, one can represent a state predicate by any
equivalent Boolean expression. We denote the membership of a state
$s$ in a state predicate $S$ by $s \models S$.

A transition is a pair of states of the form $(s, s')$ specified as
a Boolean formula as follows. Let $V'$ be the set
$\{v' \mid v \in V\}$ (called primed variables). Primed variables
are meant to show the new value of variables prescribed by a
transition. Thus, we define a transition $(s, s')$ by the
conjunction $s \wedge s'$, where $s' = \bigwedge_{j=0}^{n} l(v'_j)$
such that $v'_j \in V'$ for all $j$.

Definition 2.2 (transition predicate) A transition predicate $P$ is
a finite set of transitions
$\{(s_0, s'_0), (s_1, s'_1), \ldots, (s_m, s'_m)\}$ formally defined
by $P = \bigvee_{i=0}^{m} (s_i \wedge s'_i)$. We denote the
membership of a transition $(s, s')$ in a transition predicate $P$
by $(s, s') \models P$. ■

Notation. Let $X$ be a state predicate. We use $(X)'$ to denote the
state predicate obtained by replacing all variables that participate
in $X$ by their corresponding primed variables. Also, let $P$ be a
transition predicate. We use $Guard(P)$ to denote the source state
predicate of $P$ (i.e., $s \models Guard(P)$ iff
$\exists s' :: (s, s') \models P$). ■
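
As an illustration of states, state predicates, transition predicates, and $Guard(P)$, here is an explicit-state sketch in Python. The paper manipulates these objects symbolically as BDDs; the set-based representation, the helper names `all_states` and `guard`, and the two-variable example are assumptions made only for illustration.

```python
from itertools import product

def all_states(variables):
    """Enumerate every state as a tuple of (variable, value) pairs."""
    return [tuple(zip(variables, values))
            for values in product([False, True], repeat=len(variables))]

def guard(transitions):
    """Guard(P): the set of source states of a transition predicate P."""
    return {s for (s, _s_next) in transitions}

variables = ["a", "b"]
states = all_states(variables)

# A state predicate is a set of states; here S is "a is false".
S = {s for s in states if not dict(s)["a"]}

# A transition predicate is a set of (s, s') pairs; here P sets a to true
# whenever a is false and leaves b unchanged.
P = {(s, (("a", True), ("b", dict(s)["b"]))) for s in S}
```

In this example every state in `S` has an outgoing transition in `P`, so `guard(P)` coincides with `S`.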

Definition 2.3 (closure) Let $P$ be a transition predicate and $S$
be a state predicate. We say that $S$ is closed in $P$ iff
$\bigwedge_{(s, s') \models P} ((s \models S) \Rightarrow (s' \models (S)'))$
holds. ■
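
The closure condition can be checked directly on an explicit set of transitions. The sketch below is an explicit-state reading of Definition 2.3 rather than the symbolic check used in the paper; the function name `is_closed` and the integer states are illustrative assumptions.

```python
def is_closed(S, P):
    """Return True iff every transition of P that starts in S also ends in S."""
    return all(s_next in S for (s, s_next) in P if s in S)

# States are plain integers here purely for brevity.
P = {(0, 1), (1, 2), (2, 2), (3, 0)}
closed_example = is_closed({1, 2}, P)      # transitions from 1 and 2 stay inside
not_closed_example = is_closed({0}, P)     # (0, 1) leaves the set {0}
```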

Definition 2.4 (process) A process $j$ is specified by the tuple
$(V_j, P_j, R_j, W_j)$, where $V_j$ is a set of variables, $P_j$ is a
transition predicate in the set of all possible states obtained from
$V_j$ (called state space), $R_j$ is a set of variables that $j$ can
read, and $W_j$ is a set of variables that $j$ can write such that
$W_j \subseteq R_j \subseteq V_j$ (i.e., we assume that $j$ cannot
blindly write a variable). ■
Write restrictions. Let $(V_j, P_j, R_j, W_j)$ be a process and
$v(s)$ denote the value of a variable $v$ in state $s$. Clearly,
$P_j$ must be disjoint from the following transition predicate:
$NW_j = \bigvee_{(s, s')} \bigvee_{v \notin W_j} (v(s) \neq v(s'))$.
Read restrictions. Let $(V_j, P_j, R_j, W_j)$ be a process, $v$ be
a variable in $V_j$, and $(s_0, s'_0) \models P_j$ where
$s_0 \neq s'_0$. If $v$ is not in $R_j$, then $j$ must include a
corresponding transition from all states $s_1$ where $s_1$ and $s_0$
differ only in the value of $v$. Let $(s_1, s'_1)$ be one such
transition. Now, it must be the case that $s'_0$ and $s'_1$ are
identical except for the value of $v$. And, the value of $v$ must be
the same in $s_1$ and $s'_1$. For instance, let $V_j = \{a, b\}$ and
$R_j = \{a\}$. Since $j$ cannot read $b$, the transition
$\neg a \wedge \neg b \wedge a' \wedge \neg b'$ and the transition
$\neg a \wedge b \wedge a' \wedge b'$ have the same effect as far as
$j$ is concerned. Thus, each transition $(s_0, s'_0)$ in $P_j$ is
associated with the following group predicate:

$$Group_j(s_0, s'_0) = \bigvee_{(s_1, s'_1)} \Big(
\bigwedge_{v \notin R_j} \big(v(s_0) = v(s'_0) \wedge v(s_1) = v(s'_1)\big)
\;\wedge\;
\bigwedge_{v \in R_j} \big(v(s_0) = v(s_1) \wedge v(s'_0) = v(s'_1)\big)
\Big)$$
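
The group predicate can be illustrated by enumerating, for one transition, all transitions that the process cannot distinguish from it. The following Python sketch is an explicit-state reading of $Group_j$ under the assumption that all variables are Boolean; the function name `group` and the dictionary representation of states are hypothetical, and the paper itself computes groups symbolically.

```python
from itertools import product

def group(s0, s0_next, variables, readable):
    """All transitions (s1, s1') that a process with the given readable
    variables cannot tell apart from (s0, s0')."""
    unreadable = [v for v in variables if v not in readable]
    # If (s0, s0') changes an unreadable variable, no transition qualifies.
    if any(s0[v] != s0_next[v] for v in unreadable):
        return []
    result = []
    for values in product([False, True], repeat=len(unreadable)):
        hidden = dict(zip(unreadable, values))
        # Readable variables agree with (s0, s0'); unreadable variables take
        # an arbitrary value that the transition leaves unchanged.
        s1 = {**{v: s0[v] for v in readable}, **hidden}
        s1_next = {**{v: s0_next[v] for v in readable}, **hidden}
        result.append((s1, s1_next))
    return result

# The example from the text: V_j = {a, b}, R_j = {a}.
variables = ["a", "b"]
readable = ["a"]
s0 = {"a": False, "b": False}
s0_next = {"a": True, "b": False}
grp = group(s0, s0_next, variables, readable)
```

For the example in the text, the group contains exactly the two transitions that set `a` to true while leaving `b` at either of its values.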

Definition 2.5 (program) A program $P$ is specified by a set $Pr$
of processes. We require that the state space of all processes must
be identical (i.e., $\forall i, j \in Pr :: V_i = V_j$). Thus, the
state space of $P$ is identical to the state space of its processes
as well. For simplicity, we refer to a program $P$ by the disjunction
of its processes' transition predicates, i.e.,
$P = \bigvee_{j \in Pr} (P_j)$. ■

To concisely write the transitions in a process, we use guarded
commands (also called actions). A guarded command is of the form
$L :: g \rightarrow st$, where $L$ is a label, $g$ is a state
predicate (called guard), and $st$ is a statement that describes how
the program state is updated. Thus, an action $g \rightarrow st$
denotes the transition predicate $\{(s, s') \mid s \Rightarrow g$
and $s'$ is obtained by changing $s$ as prescribed by $st\}$.


Example (Byzantine agreement). Following the description of the
Byzantine agreement program (denoted BA) in the introduction, BA
consists of a general process $g$ and three non-general processes
$j$, $k$, and $l$. The state space of each process is obtained by
variables in

$V = \{d.g, d.j, d.k, d.l\} \;\cup$   (decision variables)
$\{f.j, f.k, f.l\} \;\cup$   (finalized?)
$\{b.g, b.j, b.k, b.l\}$.   (Byzantine?)

The transition predicate of a non-general process, say $j$, is
specified by the following two actions:

$BA1_j :: (d.j = \bot) \wedge (f.j = false) \rightarrow d.j := d.g$
$BA2_j :: (d.j \neq \bot) \wedge (f.j = false) \rightarrow f.j := true$

Since the general process only provides a decision, its transition
predicate is empty. The sets of variables that a non-general
process, say $j$, is allowed to read and write are
$R_j = \{b.j, d.j, f.j, d.k, d.l, d.g\}$ and $W_j = \{d.j, f.j\}$,
respectively.
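
The two actions of a non-general process can be read as a successor function on states. The sketch below encodes them in Python exactly as the guarded commands are written above; the dictionary encoding of states, the use of `None` for the undecided value, and the function name `enabled_actions` are assumptions made for illustration.

```python
BOTTOM = None  # hypothetical encoding of the undecided value

def enabled_actions(state, p):
    """Return the successor states produced by BA1_p and BA2_p from `state`."""
    successors = []
    d, f = f"d.{p}", f"f.{p}"
    if state[d] is BOTTOM and not state[f]:            # guard of BA1_p
        successors.append({**state, d: state["d.g"]})  # d.p := d.g
    if state[d] is not BOTTOM and not state[f]:        # guard of BA2_p
        successors.append({**state, f: True})          # f.p := true
    return successors

state = {"d.g": 1, "d.j": BOTTOM, "f.j": False}
after_copy = enabled_actions(state, "j")                 # j copies d.g
after_finalize = enabled_actions(after_copy[0], "j")     # j finalizes
```

Once process `j` has finalized, neither guard holds, so no further action of `j` is enabled.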


Definition 2.6 (computation) A sequence of states,
$c = \langle s_0, s_1, \ldots \rangle$, is a computation of program
$P$ iff the following two conditions are satisfied:
(1) $\forall j \geq 0 : (s_j, s_{j+1}) \models P$, and (2) if $c$ is
finite and terminates in state $s_l$ then there does not exist a
state $s$ such that $(s_l, s) \models P$. ■

We distinguish between a terminating computation and a deadlocked
computation. Precisely, when a computation $c$ terminates in state
$s_l$, we include the transition $(s_l, s_l)$ in $P$, i.e., $c$ can
be extended to an infinite computation by stuttering at $s_l$. On the
other hand, if there exists a state $s_d$ such that there is no
outgoing transition (or a self-loop) from $s_d$ then $s_d$ is a
deadlock state.

Definition 2.7 (deadlock state) We say that a state $s$ in program
$P$ is a deadlock state iff for all states $s'$ in the state space
of $P$, $(s, s') \not\models P$. ■
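
Definition 2.7 translates directly into a set computation when transitions are listed explicitly. The following Python sketch is such an explicit-state illustration; the function name `deadlock_states` and the integer states are assumptions, and the paper computes the corresponding set on BDDs.

```python
def deadlock_states(states, P):
    """States with no outgoing transition in P (a self-loop counts as outgoing)."""
    sources = {s for (s, _s_next) in P}
    return set(states) - sources

states = {0, 1, 2, 3}
P = {(0, 1), (1, 2), (2, 2)}   # state 2 stutters; state 3 has no transition
deadlocks = deadlock_states(states, P)
```

State 2 is terminating rather than deadlocked because it carries a stuttering self-loop, whereas state 3 has no outgoing transition at all.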

2.1  Specification  and  Invariant 

A specification $SPEC$ is a set of infinite sequences of states.
We now define what it means for a program to satisfy a
specification. We note that throughout the paper, we assume that the
state space of a program and that of its specification are identical.


Definition 2.8 (satisfies) Let $P$ be a program, $S$ be a state
predicate, and $SPEC$ be a specification. We say that $P$ satisfies
$SPEC$ from $S$ iff (1) $S$ is closed in $P$, and (2) for all
computations $c = \langle s_0, s_1, \ldots \rangle$ of $P$, where
$s_0 \models S$, $c$ is in $SPEC$. ■

Definition 2.9 (invariant) Let $P$ be a program, $SPEC$ be a
specification, and $S$ be a state predicate where $S \neq false$. We
say that $S$ is an invariant predicate of $P$ for $SPEC$ iff $P$
satisfies $SPEC$ from $S$. ■

Observe  that  the  notion  of  satisfies  characterizes  the 
property  of  infinite  sequences  with  respect  to  a  program. 
In  order  to  characterize  finite  sequences,  we  introduce  the 
notion  of  maintains. 

Definition 2.10 (maintains) Let $SPEC$ be a specification, $P$ be
a program, and $S$ be a state predicate. We say that program $P$
maintains $SPEC$ from $S$ iff (1) $S$ is closed in $P$, and (2) for
all computation prefixes $\alpha$ of $P$ that start from $S$, there
exists a sequence of states $\beta$ such that $\alpha\beta$ is in
$SPEC$. Otherwise, we say that $P$ violates $SPEC$. ■

We let the specification consist of a safety specification and a
liveness specification. Following Alpern and Schneider [5], the
safety specification can be characterized by a set of bad prefixes
that should not occur in any computation. Throughout this paper, we
let the length of such bad prefixes be two, i.e., a set of bad
transitions denoted by the transition predicate $SPEC_{bt}$. Thus,
the safety specification can be formally defined by the set
$SPEC_{\overline{bt}}$ of infinite sequences, such that no infinite
sequence contains a transition in $SPEC_{bt}$.
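
Because bad prefixes have length two, checking a state sequence against the safety specification amounts to scanning its consecutive pairs. The Python sketch below illustrates this on explicit sequences; the function name `violates_safety` and the integer states are illustrative assumptions.

```python
def violates_safety(sequence, bad_transitions):
    """True iff some consecutive pair of states in `sequence` is a bad transition."""
    return any((s, t) in bad_transitions
               for s, t in zip(sequence, sequence[1:]))

bad = {(1, 3)}                             # a single bad transition
safe_prefix = violates_safety([0, 1, 2], bad)
unsafe_prefix = violates_safety([0, 1, 3], bad)
```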

A liveness specification of $SPEC$ is a set of infinite sequences
of states that meets the following condition: for each finite
sequence of states $\alpha$ there exists a suffix $\beta$ such that
$\alpha\beta \in SPEC$. In our synthesis problem (cf. Section 4), we
begin with an initial program that satisfies its specification
(including the liveness specification). As mentioned earlier, the
focus of this paper is on developing a parallel algorithm that
resolves reachable deadlock states of a program in the presence of
faults. Clearly, such deadlock resolution is crucial in order to
ensure that any finite computation of the synthesized program can be
extended to an infinite computation that is in $SPEC$. In other
words, our synthesis method preserves the liveness specification.
Hence, the liveness specification need not be specified explicitly.

Notation. Whenever the specification is clear from the context, we
will omit it; thus, "$S$ is an invariant of $P$" abbreviates "$S$ is
an invariant predicate of $P$ for $SPEC$". ■


Example (cont'd). The safety specification of BA requires validity
and agreement. Validity requires that if the general is
non-Byzantine then the final decision of a non-Byzantine process
must be the same as that of the general. And, agreement requires
that the final decision of any two non-Byzantine processes must be
equal. Finally, once a non-Byzantine process finalizes (outputs) its
decision, it cannot change it. Thus, the following transition
predicate forms the safety specification, where $p$ and $q$ range
over non-general processes:

$SPEC_{bt_{BA}} =$
$(\exists p :: \neg b'.g \wedge \neg b'.p \wedge (d'.p \neq \bot) \wedge f'.p \wedge (d'.p \neq d'.g)) \;\vee$
$(\exists p, q :: \neg b'.p \wedge \neg b'.q \wedge f'.p \wedge f'.q \wedge (d'.p \neq \bot) \wedge (d'.q \neq \bot) \wedge (d'.p \neq d'.q)) \;\vee$
$(\exists p :: \neg b.p \wedge \neg b'.p \wedge f.p \wedge ((d.p \neq d'.p) \vee (f.p \neq f'.p)))$

The invariant predicate of the Byzantine agreement program
consists of the following states. First, we consider the set of
states where the general is non-Byzantine. In this case, one of the
non-general processes may be Byzantine. However, if a non-general
process, say $j$, is non-Byzantine, it is necessary that $d.j$ be
initialized to either $\bot$ or $d.g$. Also, a non-Byzantine process
cannot finalize its decision if its decision equals $\bot$.
Moreover, we consider the set of states where the general is
Byzantine. In this case, $g$ can change the value of $d.g$
arbitrarily. It follows that if the other processes are
non-Byzantine and $d.j$, $d.k$, and $d.l$ are initialized to the same
value that is different from $\bot$, the program satisfies
$SPEC_{bt_{BA}}$. Thus, the invariant predicate is as follows:

$S_{BA} =$
$\neg b.g \wedge (\neg b.j \vee \neg b.k) \wedge (\neg b.k \vee \neg b.l) \wedge (\neg b.l \vee \neg b.j) \;\wedge$
$(\forall p :: \neg b.p \Rightarrow (d.p = \bot \vee d.p = d.g)) \;\wedge$
$(\forall p :: (\neg b.p \wedge f.p) \Rightarrow (d.p \neq \bot)) \;\vee$
$b.g \wedge \neg b.j \wedge \neg b.k \wedge \neg b.l \wedge (d.j = d.k = d.l \wedge d.j \neq \bot)$

An alert reader can easily verify that BA satisfies
$SPEC_{bt_{BA}}$ from $S_{BA}$.


3  Fault  Model  and  Fault-Tolerance 

Following Arora and Gouda [6], the faults that a program $P$ is
subject to are systematically represented by a transition predicate
$F$ in the state space of $P$.


Example (cont'd). The fault transitions that affect a process, say
$j$, of BA are as follows (we include similar actions for $k$, $l$,
and $g$):

$F1 :: \neg b.g \wedge \neg b.j \wedge \neg b.k \wedge \neg b.l \rightarrow b.j := true$
$F2 :: b.j \rightarrow d.j, f.j := 0|1, false|true$

where $d.j := 0|1$ means that $d.j$ could be assigned either 0 or 1.
In the case of the general process, the second action does not
change the value of any $f$-variable.


Definition 3.1 (fault-span) Given a program $P$, faults $F$, and
invariant $S$, we say that a state predicate $T$ is an $F$-span
(read as fault-span) of $P$ from $S$ iff the following two
conditions are satisfied: (1) $S \Rightarrow T$, and (2) $T$ is
closed in $P \vee F$. ■
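
Definition 3.1 admits many fault-spans; the smallest one is the set of states reachable from $S$ by program and fault transitions, which can be computed as a fixed point. The Python sketch below does this on explicit sets and is only an illustration under that assumption; the function name `fault_span` is hypothetical, and the paper performs the corresponding computation on BDDs.

```python
def fault_span(S, P, F):
    """Smallest T with S contained in T and T closed in P ∪ F."""
    T = set(S)
    frontier = set(S)
    transitions = set(P) | set(F)
    while frontier:
        # One image step: successors of the newly discovered states.
        image = {t for (s, t) in transitions if s in frontier}
        frontier = image - T
        T |= frontier
    return T

S = {0, 1}
P = {(0, 1), (1, 0), (2, 0)}   # program transitions, including recovery from 2
F = {(1, 2), (3, 4)}           # fault transitions; (3, 4) is never reachable
T = fault_span(S, P, F)
```

The fault `(1, 2)` perturbs the program outside `S`, so state 2 joins the fault-span, while states 3 and 4 stay outside because no transition leads to them from `S`.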

Just as we defined the computation of $P$, we say that a sequence
of states, $\langle s_0, s_1, \ldots \rangle$, is a computation of
$P$ in the presence of $F$ iff the following three conditions are
satisfied: (1) $\forall j > 0 :: (s_{j-1}, s_j) \models (P \vee F)$,
(2) if $\langle s_0, s_1, \ldots \rangle$ is finite and terminates in
state $s_l$ then there does not exist a state $s$ such that
$(s_l, s) \models P$, and
(3) $\exists n \geq 0 :: (\forall j > n :: (s_{j-1}, s_j) \models P)$.

Definition 3.2 (fault-tolerance) Let $P$ be a program with
invariant $S$, $F$ be a set of faults, and $SPEC$ be a
specification. We say that $P$ is $F$-tolerant (read as
fault-tolerant) to $SPEC$ from $S$ iff the following two conditions
hold: (1) $P$ satisfies $SPEC$ from $S$, and (2) there exists $T$
such that (i) $T$ is an $F$-span of $P$ from $S$, (ii) $P \vee F$
maintains $SPEC$ from $T$, and (iii) every computation of $P \vee F$
that starts from a state in $T$ has a state in $S$. ■

4  The  Synthesis  Problem 

Given are a program $P$ with invariant $S$, a class of faults $F$,
and a specification $SPEC$ such that $P$ satisfies $SPEC$ from $S$.
Our goal is to find a program $P'$ with invariant $S'$ such that
$P'$ is $F$-tolerant to $SPEC$ from $S'$. In order to capture the
requirement that our synthesis method only adds fault-tolerance and
does not add new behaviors in the absence of faults, we introduce
the notion of projection.

Definition 4.1 (projection) The projection of program $P$ on state
predicate $S$, denoted as $P|S$, is the program (i.e., transition
predicate)
$\bigvee_{(s, s') \models P} ((s \models S) \wedge (s' \models (S)'))$.
I.e., $P|S$ consists of transitions of $P$ that start in $S$ and end
in $S$. ■
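
On explicit sets, projection is a simple filter over the transitions. The following Python sketch illustrates Definition 4.1 under that explicit-state assumption; the function name `project` is hypothetical.

```python
def project(P, S):
    """P|S: the transitions of P that both start and end in S."""
    return {(s, t) for (s, t) in P if s in S and t in S}

P = {(0, 1), (1, 0), (1, 2), (2, 0)}
S = {0, 1}
P_on_S = project(P, S)   # keeps (0, 1) and (1, 0) only
```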
Now,  observe  that: 

1. If $S'$ contains states that are not in $S$ then, in the absence
of faults, $P'$ may include computations that start outside $S$.
Since we require that $P'$ satisfies $SPEC$ from $S'$, it implies
that $P'$ is using a new way to satisfy $SPEC$ in the absence of
faults. Thus, we require that $S' \Rightarrow S$.

2. If $P'|S'$ contains a transition that is not in $P|S'$ then $P'$
can use this transition in order to satisfy $SPEC$ in the absence of
faults. Thus, we require that $(P'|S') \Rightarrow (P|S')$.

Following  the  above  observations,  the  synthesis  problem 
is  as  follows. 

Problem statement. Given P, S, F, and SPEC such
that P satisfies SPEC from S, identify P' and S' such
that:

(C1) S' ⇒ S,

(C2) (P'|S') ⇒ (P|S'), and

(C3) P' is F-tolerant to SPEC from S'.
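Conditions (C1) and (C2) can be checked mechanically once a projection operator is available. The following is a minimal explicit-state sketch, assuming state predicates are Python sets of states and transition predicates are sets of pairs; `project` and `satisfies_c1_c2` are hypothetical helper names, and (C3) is deliberately left out because it requires reasoning about computations.

```python
def project(program, states):
    """P|S: transitions of P that start in S and end in S."""
    return {(s, t) for (s, t) in program if s in states and t in states}


def satisfies_c1_c2(p_new, s_new, p_old, s_old):
    """Check (C1) S' => S and (C2) (P'|S') => (P|S') on explicit sets."""
    c1 = s_new <= s_old
    c2 = project(p_new, s_new) <= project(p_old, s_new)
    return c1 and c2


# Hypothetical toy instance.
P = {(0, 1), (1, 0), (1, 2)}
S = {0, 1, 2}
S_new = {0, 1}
# (3, 0) is a recovery transition that starts outside S'.
P_new = {(0, 1), (1, 0), (3, 0)}
```

Recovery transitions that start outside S' do not affect (C2), since the projection on S' ignores them; adding a new transition inside S', such as (0, 0), would violate it.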

Notice that the third condition of the synthesis problem
implies that every computation of P' that starts from
a state in the fault-span of P', say T', has to be infinite
(cf. Definition 3.2). Hence, T' cannot include any deadlock
states. In the next section, we introduce our parallel
algorithm for resolving deadlock states reachable from S
using transitions in P ∨ F. This algorithm can be used
as a building block of algorithms for synthesizing P' and
S'.

5  Parallel  Symbolic  Resolution  of 
Deadlock  States 

In this section, we present our parallel BDD-based algorithm
for resolving deadlock states reachable in the presence
of faults in a distributed program. A major barrier in
such parallelization is that BDD manipulation packages
are not reentrant due to data structures shared across several
BDDs (e.g., a hash table that stores all BDD nodes).
There are two approaches to deal with this obstacle. The
first approach is to modify a BDD package to make it
reentrant (cf. Section 7 for details). The second approach
is to utilize multiple instances of the BDD package
that do not share memory. With this approach, each
thread works on its own copy of related BDDs. However,
changes made by one thread are not immediately
available to other threads. Hence, threads may
change the BDDs (e.g., the program being synthesized)
inconsistently. Therefore, we need to merge the results
and remove or manage the inconsistencies. In this work,
we consider the second approach.

Algorithm sketch. Intuitively, our algorithm works
as follows. During deadlock resolution, a master thread
spawns several worker threads, each running on a different
processor core in parallel with its own instance of the
BDD package. The instance of the BDD package assigned
to each worker thread is initialized using BDDs
for program transitions, invariant predicate, fault-span,
and fault transitions. The master thread partitions the set
of deadlock states and provides each worker thread with
one such partition. Subsequently, worker threads start
resolving their assigned set of deadlock states in parallel
by either (1) adding safe recovery, or (2) eliminating
the ones (i.e., making them unreachable) from where safe
recovery is not possible. Upon completion, the master
thread merges the results returned by each worker thread
and resolves inconsistencies.

5.1  Parallel  Addition  of  Safe  Recovery 

Given a program P, faults F, fault-span T, invariant
predicate S, safety specification SPEC_bt, and partition
predicates prt_1 ... prt_n, where n > 1 is the number of
worker threads to be spawned, our goal is to synthesize
a transition predicate P' such that T contains no deadlock
states, i.e., T ∧ ¬Guard(P') = false. Before we
describe our parallel algorithm for resolving deadlock
states through addition of recovery actions, notice that
such a recovery mechanism should not violate the safety
specification. Thus, we first identify the state predicate
ms (Line 2 in Algorithm ResolveDeadlockStates in
Figure 1.a) from where faults alone can reach a state
where Guard(F ∧ SPEC_bt) is true (i.e., faults alone can
violate the safety). Now, let mt include the transitions
in SPEC_bt as well as transitions in P that end in ms.
Observe that in order to ensure safety, P' (including its
recovery actions) must be disjoint from mt.
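The computation of ms and mt can be pictured with explicit sets. The sketch below is an explicit-state stand-in for the symbolic computation, not the BDD-based implementation: `compute_ms` performs a backward fixpoint over fault transitions, and `in_mt` follows Line 3 of ResolveDeadlockStates (mt := SPEC_bt ∨ (ms)'). Both function names and the toy transitions are assumptions introduced here.

```python
def compute_ms(faults, bad_transitions):
    """States from where faults alone can reach a state in which a fault
    transition that violates safety is enabled (Guard(F ^ SPEC_bt))."""
    ms = {s for (s, t) in faults if (s, t) in bad_transitions}
    changed = True
    while changed:
        # Backward step: sources of fault transitions that lead into ms.
        pre = {s for (s, t) in faults if t in ms}
        changed = not pre <= ms
        ms |= pre
    return ms


def in_mt(transition, ms, bad_transitions):
    """mt = SPEC_bt v (ms)': unsafe transitions plus transitions ending in ms."""
    _, target = transition
    return transition in bad_transitions or target in ms


# Hypothetical toy instance: (2, 3) is a fault transition that violates safety.
F = {(1, 2), (2, 3), (4, 5)}
SPEC_bt = {(2, 3), (0, 9)}
ms = compute_ms(F, SPEC_bt)
```

Any recovery transition for which `in_mt` returns true would have to be excluded from P'.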

After identifying the set ds of deadlock states in T
(Line 4), we partition ds using the partition predicates
such that ⋁_{i=1}^{n} (prt_i ∧ ds) = ds. To efficiently partition
deadlock states between threads, one needs to design
a method such that (1) deadlock states are evenly
distributed among worker threads, and (2) states considered
by different threads for elimination have a small
overlap during backtracking. Regarding the first constraint,
we can partition deadlock states based on values
of some variable and evaluate the size of corresponding
BDDs by the number of minterms that satisfy the corresponding
formula. Regarding the second constraint,
we expect that the overhead for such a split is high,
as it requires dedicated analysis of program transitions.
Hence, instead of satisfying this constraint, we add
synchronization between threads. Thus, we design partition
predicates based on the values of variables. For example, in the
case of the Byzantine agreement program with four worker
threads, we let prt_1 = (d.j = 0) ∧ (d.k = 0), prt_2 =
(d.j = 0) ∧ (d.k ≠ 0), prt_3 = (d.j ≠ 0) ∧ (d.k = 0),
and prt_4 = (d.j ≠ 0) ∧ (d.k ≠ 0). Next, we assign
each partition prt_i ∧ ds of deadlock states to a worker
thread to identify safe recovery paths from prt_i ∧ ds to
the invariant predicate in a layered fashion (Lines 5-8 in
Algorithm ResolveDeadlockStates).
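The four partition predicates of the Byzantine agreement example can be tried out on an explicit set of states. In the sketch below a state is reduced to the pair (d.j, d.k), with `None` standing for an undecided value; this encoding, the helper `partition`, and the toy set `ds` are assumptions for illustration and are not the paper's BDD encoding.

```python
from itertools import product

# Partition predicates prt_1..prt_4 over a state s = (d_j, d_k).
prt = [
    lambda s: s[0] == 0 and s[1] == 0,
    lambda s: s[0] == 0 and s[1] != 0,
    lambda s: s[0] != 0 and s[1] == 0,
    lambda s: s[0] != 0 and s[1] != 0,
]


def partition(ds, predicates):
    """Return the sets prt_i ^ ds, one per worker thread."""
    return [{s for s in ds if p(s)} for p in predicates]


# Hypothetical set of deadlock states: every pair except the undecided one.
ds = set(product([0, 1, None], repeat=2)) - {(None, None)}
parts = partition(ds, prt)
```

Because the four predicates are mutually exclusive and jointly exhaustive, the partitions cover ds without overlap; whether they are balanced depends on how the deadlock states are distributed over the chosen variables.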

Each worker thread for adding recovery works as follows
(cf. Thread AddRecovery in Figure 1.b). Let the
first layer, lyr, be the invariant predicate S (Line 1). We
now construct the recovery transition predicate rt by (1)
including transitions that originate from the given set of
deadlock states ds and end in lyr (Line 3), and (2) excluding
transitions that can lead the program to a state
where safety may be violated (Line 4). We add the resulting
recovery transition predicate to rec (Line 5). Now, for
the next iteration, we let lyr be the state predicate from
where one-step safe recovery is possible (Line 6). We
continue adding recovery transition predicates until no
such transition predicate is added. Notice that our strategy
on adding recovery paths guarantees that no cycles

Algorithm 1 ResolveDeadlockStates

Input: program P, faults F, invariant S, fault-span T, safety
specification SPEC_bt, and partition predicates prt_1..prt_n, where n is
the number of worker threads.
Output: program P' and the predicate fte of states failed to eliminate.

 1: Let rfo be the state predicate reachable by faults only from the
    invariant predicate;
 2: Let ms be the state predicate from where faults alone can reach
    a state where Guard(F ∧ SPEC_bt) is true;
 3: mt := SPEC_bt ∨ (ms)';
 4: ds := T ∧ ¬Guard(P);
    // Resolving deadlock states by adding safe recovery
 5: for i := 1 to n do
 6:     rt_i := SpawnThread → AddRecovery(ds ∧ prt_i, S, mt);
 7: end for
 8: ThreadJoin(1..n);
 9: P := P ∨ ⋁_{i=1}^{n} rt_i;
10: vds, fte := false;
11: ds := T ∧ ¬Guard(P);
    // Eliminating deadlock states from where safe recovery is not
    // possible
12: for i := 1 to n do
13:     rp_i, vds_i, fte_i := SpawnThread → Eliminate(ds ∧ prt_i, P, S, F, T, vds, rfo, fte);
14: end for
15: ThreadJoin(1..n);
    // Merging results from worker threads
16: P' := Group(⋀_{i=1}^{n} rp_i);
17: fte := ⋁_{i=1}^{n} fte_i;
18: vds := ⋁_{i=1}^{n} vds_i;
19: nds := ((T ∧ ¬S) ∧ ¬Guard(P')) ∧ ¬((T ∧ ¬S) ∧ ¬Guard(P));
20: P' := P' ∨ Group(P ∧ nds);
21: P' := P' ∨ Group(P ∧ (fte ∧ rfo)');
22: return P', fte;


Thread 1 AddRecovery

Input: deadlock states ds, invariant S, and transition predicate mt.
Output: recovery transition predicate rec.

1: lyr, rec := S, false;
2: repeat
3:     rt := Group(ds ∧ (lyr)');
4:     rt := rt ∧ ¬Group(rt ∧ mt);
5:     rec := rec ∨ rt;
6:     lyr := Guard(ds ∧ rt);
7: until (lyr = false);
8: return rec;


Thread 2 Eliminate

Input: deadlock states ds, program P, invariant S, fault transitions
F, fault-span T, visited deadlock states vds, state predicate
reachable by faults only rfo, predicate fte failed to eliminate.
Output: revised program transition predicate P, visited deadlock
states vds, predicate fte failed to eliminate.

 1: wait(mutex);
 2: ds := ds ∧ ¬vds;
 3: vds := vds ∨ ds;
 4: signal(mutex);
 5: if (ds = false) then
 6:     return P;
 7: end if
 8: old := P;
 9: tmp := (T ∧ ¬S) ∧ P ∧ (ds)';
10: P := P ∧ ¬Group(tmp);
11: fs := Guard(T ∧ ¬S ∧ F ∧ (ds)') ∧ ¬rfo;
12: P, vds, fte := Eliminate(fs, P, S, F, T, vds, rfo, fte);
13: nds := Guard(T ∧ ¬S ∧ Group(tmp) ∧ ¬Guard(P));
14: P := P ∨ (Group(tmp) ∧ nds);
15: nds := nds ∧ Guard(tmp);
16: fte := fte ∨ ¬(old ∧ ¬P ∧ T ∧ (ds)')'';
17: P, vds, fte := Eliminate(nds ∧ ¬S, P, S, F, T, vds, rfo, fte);
18: return P, vds, fte;


(a)  Master  Thread  (b)  Worker  Threads 

Figure  1:  Parallel  algorithm  for  resolving  deadlock  states. 


are introduced to the fault-span. Hence, any computation
that takes a recovery path reaches the invariant predicate
in a finite number of steps.
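The layered construction performed by Thread AddRecovery can be sketched on explicit sets as follows. This is not the symbolic algorithm: the parameter `candidates` is a hypothetical stand-in for the transitions that the read/write restrictions (captured by Group in Figure 1) allow, and the sketch removes already-recovered states from `ds` before building the next layer, an assumption added here so that the explicit-state loop terminates and cannot introduce cycles.

```python
def add_recovery(ds, invariant, mt, candidates):
    """Layered addition of safe recovery from deadlock states to the invariant."""
    layer, rec = set(invariant), set()
    remaining = set(ds)
    while layer:
        # Transitions from unresolved deadlock states into the current layer,
        # excluding those that may lead to a safety violation (mt).
        rt = {(s, t) for (s, t) in candidates
              if s in remaining and t in layer and (s, t) not in mt}
        rec |= rt
        # Next layer: states from where one-step safe recovery was just added.
        layer = {s for (s, _) in rt}
        remaining -= layer
    return rec


# Hypothetical toy instance.
S = {0}
ds = {1, 2, 3}
candidates = {(1, 0), (2, 1), (2, 0), (3, 3), (1, 2)}
mt = {(2, 0)}
rec = add_recovery(ds, S, mt, candidates)
```

Here state 1 recovers directly, state 2 recovers through state 1 because its direct recovery transition is in mt, and state 3 receives no recovery, so it would have to be made unreachable.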

Once all worker threads complete their jobs (Line 8
in Figure 1.a), the master thread adds all the recovery
transitions returned by worker threads to the program's
transition predicate (Line 9 in Algorithm ResolveDeadlockStates).
At this point, the remaining deadlock states
(Line 11) have to be made unreachable, as it is not possible
to add safe recovery from them to the invariant predicate.


Example (cont'd). As mentioned in the introduction,
one type of deadlock states in BA is of the form
(0, 0, 0, 1), where the Byzantine general g and non-general
processes j and k agree on decision 0, but process
l has decided on 1. The algorithm ResolveDeadlockStates
resolves such deadlock states and their symmetrical
states by adding the following recovery actions
to process l (and by symmetry to processes j and k) of
BA:

BA3_l :: (d.j = 0) ∧ (d.k = 0) ∧ (d.l = 1) ∧ (f.l = 0)
         → d.l, f.l := 0, 0|1

BA4_l :: (d.j = 1) ∧ (d.k = 1) ∧ (d.l = 0) ∧ (f.l = 0)
         → d.l, f.l := 1, 0|1


5.2  Parallel  State  Elimination 

Let ds be a deadlock state predicate from where recovery
to the invariant predicate cannot be added. Hence,
in order for P' (the synthesized program) to satisfy the
third condition of the synthesis problem, we need to ensure
that ds is eliminated from the set of states that P'
can reach in the presence of faults. Similar to addition of
recovery paths, the Algorithm ResolveDeadlockStates
launches one worker thread per partition of ds for
elimination (Lines 12-15).

The Thread Eliminate (cf. Figure 1.b) works as follows.
We first keep track of visited deadlock states by all
worker threads (Lines 1-4) so that no thread attempts to
eliminate deadlock states that have already been considered
for elimination. In particular, all threads synchronize
on the predicate vds which contains visited deadlock
states by all threads (Lines 1-4). Next, we remove
all incoming transitions to ds (Lines 8-10). Then, since
a program does not have control over the occurrence of
faults, we eliminate states that can reach ds via a fault
transition (Lines 11-12). Now, if removal of transitions
in Line 10 causes some state predicate nds to become a
deadlock state predicate (Line 13) then we add the transitions
(and the corresponding group) that begin from nds
(Lines 15-17) to P and, instead, we eliminate nds¹. We
keep repeating this procedure recursively until there does
not exist a state to eliminate.

Figure 2: Inconsistencies raised by concurrency.
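The synchronization on vds in Lines 1-4 of Thread Eliminate amounts to a mutex-protected "claim" operation. The following sketch illustrates it with Python sets and `threading.Lock`; the class `VisitedDeadlockStates`, its `claim` method, and the three hypothetical partitions are assumptions for illustration, whereas the paper's implementation operates on BDDs held in separate CUDD instances.

```python
import threading


class VisitedDeadlockStates:
    """Shared predicate vds, guarded by a mutex (explicit-state sketch)."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._vds = set()

    def claim(self, ds):
        """Return the part of ds no thread has visited yet and mark it visited."""
        with self._mutex:                  # wait(mutex) ... signal(mutex)
            fresh = set(ds) - self._vds    # ds := ds ^ not vds
            self._vds |= fresh             # vds := vds v ds
        return fresh


vds = VisitedDeadlockStates()
claimed = []


def worker(ds):
    # Each worker only keeps the deadlock states it claimed first.
    claimed.append(vds.claim(ds))


threads = [threading.Thread(target=worker, args=(part,))
           for part in ({1, 2, 3}, {3, 4}, {1, 4, 5})]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whatever the interleaving, every state is claimed by exactly one worker, which is what keeps threads from eliminating the same deadlock states twice.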

Once all worker threads complete their job (Line 15
in Figure 1.a), the master thread merges all the results
by collecting transitions that all worker threads agree
on (Line 16). Although the above algorithm is a sound
building block for a sequential algorithm, it may create
inconsistencies when multiple instances of it run in parallel.

5.2.1  Handling  Inconsistencies 

Let s1 and s2 be two states that are considered for elimination
and (s0, s1) and (s0, s2) be two transitions for
some s0. A sequential algorithm that applies Eliminate
removes transitions (s0, s1) and (s0, s2), which causes s0
to be a new deadlock state (cf. Figure 2.a). Hence, it
puts (s0, s1) and (s0, s2) (and corresponding group predicates)
back into the program being synthesized and invokes
Eliminate on state s0. However, when multiple
worker threads, say th1 and th2, run concurrently, there
are three possible scenarios that cause inconsistencies,
described next.


¹ Let P be a transition predicate. (P)'' denotes the state predicate
obtained by first abstracting unprimed variables in P and then
replacing all primed variables of P by their corresponding unprimed
variables.


Case 1. Consider the case where deadlock states s1 and
s2 are in different partitions. Hence, th1 invokes Eliminate
on s1 which in turn removes (s0, s1), and th2 invokes
Eliminate on s2 which removes (s0, s2) (cf. Figure 2.b).
Thus, neither thread invokes Eliminate on s0,
since they do not identify s0 as a deadlock state. Subsequently,
when the master thread merges the results returned
by th1 and th2 (i.e., Line 16 in Figure 1.a), s0
becomes a new deadlock state which has to be eliminated
while the group predicates of transitions (s0, s1)
and (s0, s2) have been removed unnecessarily. In order
to resolve this case, we restore all outgoing transitions
that start from s0 and mark s0 as a state that has to be
eliminated in subsequent iterations (Lines 19-20).
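Case 1 can be reproduced on a three-transition toy program. The sketch below uses explicit sets; the state names, the helper `deadlocks`, and the variables `rp1` and `rp2` (the programs returned by th1 and th2) are assumptions for illustration, and group predicates are ignored.

```python
def deadlocks(program, states):
    """States in `states` with no outgoing program transition."""
    return {s for s in states if not any(src == s for (src, _) in program)}


# Hypothetical fragment: s0 -> s1 and s0 -> s2, where s1 and s2 are to be eliminated.
P = {("s0", "s1"), ("s0", "s2"), ("s3", "s0")}
rp1 = P - {("s0", "s1")}   # th1 removes the incoming transition of s1
rp2 = P - {("s0", "s2")}   # th2 removes the incoming transition of s2

# Line 16: keep only the transitions that all workers agree on.
merged = rp1 & rp2
new_deadlocks = deadlocks(merged, {"s0"}) - deadlocks(P, {"s0"})

# Lines 19-20: restore the outgoing transitions of the new deadlock states;
# s0 is then left to be eliminated in a subsequent iteration.
repaired = merged | {(s, t) for (s, t) in P if s in new_deadlocks}
```

Neither worker sees s0 as a deadlock state on its own copy, yet the intersection leaves s0 without outgoing transitions, which is exactly the inconsistency the master thread has to repair.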

Case 2. Due to the backtracking behavior of Eliminate,
it is possible that th1 and th2 consider common states
for elimination. In particular, if th1 considers s1 and th2
considers both s1 and s2 for elimination (cf. Figure 2.b),
after merging the results, no new deadlock states are
introduced. However, (s0, s1) would be removed unnecessarily.
In order to resolve this case, we collect all the
states that worker threads failed to eliminate (i.e., state
predicate fte in Line 17 in Figure 1.a) and restore all
incoming transitions into those states (Line 21).

Case 3. It is also possible that th1 considers s1 and th2
considers neither s1 nor s2 (cf. Figure 2.c). This case occurs
when th2 stops backtracking at a level higher than s1
and s2 in the reachability graph due to facing either Case
1 or Case 2. Thus, when the master thread merges the
results returned by the worker threads, no new deadlock
state is introduced, but (s0, s1) is removed unnecessarily.
While identifying this case given the structures in Figure 2.c
is not straightforward, one approach to resolve
this inconsistency is to force all worker threads to synchronize
at each backtracking step. Since such synchronization
is likely to degrade the performance of the parallel
algorithm, we choose not to handle this case. Notice that
removal of (s0, s1) does not result in synthesizing an incorrect
program. However, the program synthesized using
the parallel algorithm may have fewer transitions than
the program synthesized by the sequential algorithm. We
note that this case is not due to our algorithm strategy, but
is an artifact of the breadth-first-search nature of BDD-based
reachability analysis. In fact, any random state space
search strategy may as well exhibit this case.


Example (cont'd). As mentioned in the introduction,
another type of deadlock states in BA is of the
form (0, 0, 0, 1), where non-general processes j and k
agree with the Byzantine general on decision 0, but process
l has finalized its decision on 1. Since process l
has finalized its decision, we cannot resolve such deadlock
states by adding safe recovery. Thus, the algorithm
ResolveDeadlockStates has to eliminate states
in (0, 0, 0, 1). More specifically, the Thread Eliminate
backtracks through the reachability graph until it removes
the transition (1, T, ⊥, 1) → (1, ⊥, ⊥, 1). This
removal creates no new deadlock state and, hence, Eliminate
terminates successfully. Precisely, our algorithm
revises action BA2_l, so that no computation of BA in the
presence of faults reaches a deadlock state, as follows:

BA2_l :: (d.l ≠ ⊥) ∧ (f.l = false) ∧ (d.j ≠ ⊥ ∨ d.k ≠ ⊥)
         → f.l := true

We note that in the context of BA, inconsistency of type
Case 3 does not occur. However, Cases 1 and 2 do occur, but
our algorithm fixes them. In fact, the output of our synthesis
algorithm is identical to the solution proposed by Lamport,
Shostak, and Pease [3].


6  Experimental  Results  and  Analysis 

In this section, we present experimental results of the
implementation of the Algorithm ResolveDeadlockStates.
Throughout this section, all parallel experiments
are run on a Sun Fire V40z with 2 dual-core Opteron
processors and 16GB RAM. The BDD representation of
the Boolean formulae has been done using the C++ interface
to the CUDD package developed at University of
Colorado [7]². We note that our algorithm is deterministic
and the testbed is dedicated. Hence, the only non-deterministic
factor in time for synthesis is synchronization
among threads. Based on our experience with the
synthesis, this factor has a negligible impact and, hence,
multiple runs on the same data essentially reproduce the
same results.

Table 1 illustrates the detailed outcome of our experiments
with respect to two programs, namely, Byzantine
agreement (denoted BA^i) and Byzantine agreement with
fail-stop faults (denoted BAFS^i), where i is the number
of non-general processes. In BAFS, in addition to
Byzantine faults introduced in Section 3, the program
is subject to fail-stop faults which stop normal operation
of a process. Clearly, as compared to BA, BAFS
has a larger set of reachable states and a more complex
structure. The table shows total synthesis time,
state elimination time including the time spent in worker
threads Eliminate and handling inconsistencies, addition
of recovery time, and memory usage for synthesizing
the fault-tolerant version of the given program. Recall
that in addition to deadlock resolution, the total synthesis
time includes other tasks such as generation of fault-span,
removing unsafe actions, and reconstructing the invariant,
which are not in the scope of this paper and, therefore,
are omitted in Table 1.

² Note that the results for the sequential algorithm in this paper are
different from the ones that appeared in [2] due to unrelated optimizations
that are present in both the sequential and parallel algorithms.

6.1  Parallelism  Timing  Analysis 

Before we analyze the results, we note that for fewer than
10 non-general processes, our parallel algorithm does not
outperform the sequential (not threaded) algorithm due to
negligible state elimination time and a high level of context
switching. However, for 10 or more non-general processes,
as can be seen in Table 1, all results show significant
speedups when our parallel algorithm runs on
two or four cores as compared to the sequential algorithm.
In fact, as the size of reachable states (i.e., the
fault-span) grows, the parallel algorithm exhibits a better
performance in both state elimination and addition of
recovery. For instance, in case of BAFS25, deadlock resolution
takes more than one day using the sequential algorithm,
whereas the same task can be accomplished in
slightly more than 1.5 hours using the parallel algorithm
running on four cores. This speedup is observed in virtually
all the experiments. However, the table shows that
4-core runs do not show significant improvement over 2-core
runs. We explain the reason later in this section.

One can observe that the performance improvement of
our parallel algorithm is superlinear. Obviously, such
a dramatic improvement cannot be solely attributed to
parallelization. Our experiments show that this speedup
is due to both parallelization and partitioning of deadlock
states, which significantly reduces the size of BDDs involved
during deadlock resolution. To understand the
reason for the superlinear speedup from Table 1, we
conduct three sets of experiments. First, after creating
the threads, we force the threads to run sequentially by
adding synchronization between them (cf. Table 2 for
results). While this setup explains a part of the superlinear
speedup, we find that the completion time for the
case where threads run on two cores is less than half of
that for the case where threads run sequentially. To understand
this, we identify the size of the BDDs explored
in the partitioned sequential run and in the parallel run
(cf. Table 3). Furthermore, we perform a subset of experiments
from Table 1 on a single processor machine
where no (additional) synchronization is added between
the threads but they are prevented from running simultaneously
because the underlying machine has only one
core (cf. Table 4). These results conclusively demonstrate
that the reduction in the size of BDDs caused by
partitioning the deadlock states is responsible for the
superlinear speedup.

Columns: program, RS | sequential: Tt, El, Rc, Mm | 2 cores: Tt, Ex, Ic, Rc, Mm | 4 cores: Tt, Ex, Ic, Rc, Mm

…  36  …  26  1.1  …  0.2  41  …  0.5  0.2  0.1  69
…              | 57     55     1.2  29 | 5.1     3       0.4   1.4    46 | 4.4   2.4   0.7  0.9   75
BA25    10^20  | 317    312    4    29 | 14.5    9.1     0.9   3.6    46 | 13.4  8     1.5  3     75
BA27    …      | 538    530    5.5  32 | 21.4    13.5    1     5.7    46 | 20.4  12.3  2.1  4.5   73
…       10^23  | 700    687    10   33 | 26.8    17.9    1.2   6.3    46 | 30.9  16    3.2  7.3   80
BAFS10  …      | 2.9    2.7    0.1  28 | 0.8     0.4     0.1   0.1    25 | 0.8   0.3   0.1  0.1   82
…              | 82.8   80.9   1.4  31 | 5.8     3       0.6   1.5    47 | 5.6   2.6   0.7  1.3   73
…  34  30  …  24.9  13.9  2.3  5.2  85
…  > 24h  *  *  *  …  69.5  5.5  26.2  58  96  …  5.4  24  97
…  *  *  *  …  94.8  6.3  35.3  …  84.1  8.03  36.2  99
BAFS28  10^25  | > 24h  *      *    *  | 170.63  113.05  7.48  36.98  60 | 170   102   8    40.7  100

Table 1: Experimental results for algorithm ResolveDeadlockStates. RS: Size of reachable states. Tt: Total synthesis time
in minutes. El: Total time spent in state elimination in minutes. Ex: Total time (m) spent by Eliminate worker threads. Ic:
Time spent (m) for resolving inconsistencies. Rc: Time spent (m) for addition of recovery paths. Mm: Memory usage in KB.

…  0.35  …  57  …  0.38  …  0.9  0.1  0.1  2.2  …  0.1
…  82.8  80.9  1.4  16.6  13  …  2.1  26.4  22.6  0.8  1.9

Table 2: Effect of partitioning without parallelizing.

In order to study the experimental results in detail,
consider Table 2, where we partition the set of deadlock
states and then run Eliminate for each partition in
a sequential manner so that the output (transition predicate)
of state elimination for the first partition is input
to the second invocation of Eliminate for the second
partition. For instance, in case of BAFS15, we gain
82.8/16.6 ≈ 5 times speedup by only splitting deadlock states
in two partitions. However, Table 1 shows that the overall
speedup for BAFS15 is 82.8/5.8 ≈ 14.3, which means
we gain 16.6/5.8 ≈ 2.9 by parallelizing on two cores. Notice
that other experiments have the same pattern. There
are two reasons for this extra speedup: (1) smaller size
of BDDs in the parallel algorithm as compared to the
partitioned sequential algorithm, and (2) distribution of BDDs
across multiple threads. These issues are discussed next.
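The decomposition of the overall speedup into a partitioning factor and a parallelization factor can be checked with a few lines of arithmetic. The sketch below assumes that 82.8, 16.6, and 5.8 minutes are the total synthesis times of the sequential run, the two-partition sequential run, and the two-core run for the same program; these values are taken from the rows of Tables 1 and 2 whose ratios match the speedups quoted above, and that row assignment is an inference. The variable names are illustrative.

```python
# Total synthesis times in minutes (assumed to belong to the same program).
seq_total = 82.8          # sequential algorithm
partitioned_total = 16.6  # two partitions, run one after another
two_core_total = 5.8      # two partitions, run in parallel on two cores

# Gain from partitioning alone, overall gain, and gain from parallel execution.
speedup_partitioning = seq_total / partitioned_total
speedup_overall = seq_total / two_core_total
speedup_parallel_only = partitioned_total / two_core_total

print(round(speedup_partitioning, 1),
      round(speedup_overall, 1),
      round(speedup_parallel_only, 1))
```

The overall speedup is the product of the two factors, which is why a factor above 2 on two cores can only be explained by effects beyond plain parallel execution, such as the smaller BDDs discussed next.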

The effect of parallelization on the size of BDDs. Table 3
shows the number of nodes in the BDD that represents
visited deadlock states (i.e., the variable vds in
Thread Eliminate in Figure 1.b) for parallel and sequential
invocations of Eliminate. As can be seen, the number
of nodes in the parallel runs is smaller and, hence, their
manipulation is faster. This is due to the fact that when
two threads are running in parallel and synchronize on
vds, they do not explore the reachability graph as deep
as when they are running one after another. In other
words, when two Eliminates run concurrently they do
not invade each other's territory. Moreover, one can observe
that this behavior is more dramatic as programs get
larger. As a direct result, our algorithm benefits from the
synchronization on vds. We have observed this pattern
in other experiments as well.

         Sequential 2-partition        Parallel 2-core
         Eliminate1    Eliminate2      Eliminate1    Eliminate2
BA15     4938          4943            4603          4774
BA20     8943          8379            6578          6464

Table 3: Number of nodes in BDDs that represent visited
deadlock states.

         Seq.    Par. 2-partition 1-core    Par. 4-partition 1-core
BA15     6.2     1.4                        2.1
BA20     51.8    6                          9.4

Table 4: Total synthesis time when parallel algorithm runs on
a single-core machine. (Note that since this set of experiments
required a single-core machine, they are performed in a different
setup than previous experiments. Hence, the time cannot
be directly compared with time from other tables.)

The effect of distribution of BDDs across multiple
threads. As another approach to analyze the superlinear
speedup, we repeated a subset of experiments presented
in Table 1 on a single processor/core machine
with a 2.2GHz processor and 1GB memory. Thus, in this
setup, similar to the experiments from Table 1, deadlock
states are partitioned into multiple threads. Although no
explicit (additional) synchronization is added between
these threads (as done in experiments in Table 2), they
cannot execute simultaneously since there is only one
processor/core. The results from these experiments are
available in Table 4. As we can see from this table,
for BA15 (respectively, BA20), a speedup of 4.3 (respectively,
8.6) is obtained with two threads running on a single
core. By comparison, in this example, the speedup
was 6.3 (respectively, 11.2) when these threads were permitted
to execute on a multicore machine. Thus, results
from Tables 2-4 conclusively demonstrate that the superlinear
speedup in Table 1 is caused by the fact that the
size of the BDDs is reduced due to partitioning of deadlock
states across different threads.

Table  2  also  reveals  why  4-core  runs  do  not  outperform 
2-core  runs  significantly.  This  is  due  to  creation  of  sig¬ 
nificantly  more  inconsistencies  in  a  4-partition  structure 
than  a  2-partition  structure.  In  fact,  parallelization  using 
4-core  shows  a  better  improvement  than  2-core.  Thus, 
our  parallel  algorithm  is  considerably  efficient.  Table  1 
also  shows  that  we  benefited  from  parallelism  since  the 
time  spent  to  resolve  inconsistencies  was  significantly 
less  than  the  time  spent  for  running  worker  Eliminate 
threads. However, more research needs to be done on
effective partitioning, which is an issue in distributed
model checking as well. As an example of unbalanced
partitioning, we note that if one partitions the deadlock
states of Byzantine agreement based on b.g and d.j, no
speedup is gained, since the value of b.g in all deadlock
states in the fault-span is 1.
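
The effect of choosing a poor partitioning variable can be illustrated on an explicit set of states. The sketch below is not the symbolic implementation: states are plain dictionaries, and the helper partition_by, the extra variable d.k, and the sample values are all hypothetical. It only assumes, as stated above, that b.g is 1 in every deadlock state.

```python
from collections import defaultdict
from itertools import product


def partition_by(states, variables):
    """Group states by their values on the given partitioning variables."""
    parts = defaultdict(list)
    for s in states:
        parts[tuple(s[v] for v in variables)].append(s)
    return parts


# Hypothetical deadlock states: b.g is always 1; d.j and d.k vary over {0, 1}.
deadlock_states = [
    {"b.g": 1, "d.j": dj, "d.k": dk} for dj, dk in product((0, 1), repeat=2)
]

# Partitioning on (b.g, d.j) aims at 4 partitions, but only 2 are non-empty.
unbalanced = partition_by(deadlock_states, ["b.g", "d.j"])
# Partitioning on (d.j, d.k) fills all 4 partitions.
balanced = partition_by(deadlock_states, ["d.j", "d.k"])

print(len(unbalanced), len(balanced))
```

In this toy setting, half of the threads assigned by the (b.g, d.j) partitioning would receive no deadlock states at all, so adding them cannot reduce the work of the remaining threads.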

We have also observed that when a distributed program has
a large number of processes, computing group predicates
becomes a bottleneck, which in turn may degrade the
execution of the worker threads to that of the
corresponding sequential algorithm. In fact, this is the
very reason that parallel addition of recovery does not
show a significant performance improvement.

6.2  Memory  Usage 

Although  incorporating  multiple  instances  of  a  BDD 
package  increases  the  memory  usage,  we  argue  that  since 
the required amount of memory is not a bottleneck, the
trade-off between speedup and memory usage is remarkably
beneficial. In fact, the crucial factor in our experiments
(and perhaps in program synthesis in general) is
time  and  not  space.  Moreover,  Table  1  shows  that  instan¬ 
tiating  two  BDD  packages  does  not  double  the  amount  of 
required  memory. 

7  Related  Work 

Automated program synthesis and revision have been
studied  from  various  perspectives.  Inspired  by  the  sem¬ 
inal  work  by  Emerson  and  Clarke  [8],  Arora,  Attie, 
and  Emerson  [9]  propose  an  algorithm  for  synthesizing 
fault-tolerant  programs  from  CTL  specifications.  Their 
method,  however,  does  not  address  the  issue  of  addi¬ 
tion  of  fault-tolerance  to  existing  programs.  Kulkarni 
and  Arora  [10]  introduce  enumerative  synthesis  algo¬ 
rithms  for  automated  addition  of  fault-tolerance  to  cen¬ 
tralized  and  distributed  programs.  In  particular,  they 
show  that  the  problem  of  adding  fault-tolerance  to  dis¬ 
tributed  programs  is  NP-complete.  In  order  to  remedy 
the  NP-hardness  of  synthesis  of  fault-tolerant  distributed 
programs  and  overcome  the  state  explosion  problem,  we 
proposed  a  set  of  symbolic  heuristics  [2]  which  allowed 
us to synthesize programs with a state space of size 10^30
and beyond.

Ebnenasir  [11]  presents  a  divide-and-conquer  method 
for  synthesizing  failsafe  fault-tolerant  distributed  pro¬ 
grams.  A  failsafe  program  is  one  that  does  not  need  to 
satisfy  its  liveness  specification  in  the  presence  of  faults. 
Thus,  a  respective  synthesis  algorithm  does  not  need  to 
resolve  deadlock  states  outside  the  invariant  predicate. 
Moreover,  Ebnenasir’s  synthesis  method  resolves  dead¬ 
lock  states  inside  the  invariant  predicate  in  a  sequential 
manner. 

Parallelization  of  symbolic  reachability  analysis  has 
been  studied  in  the  model  checking  community  from 
different perspectives. In [4, 12, 13], the authors
propose solutions and analyze different approaches to the
parallelization of saturation-based generation of state
space in model checking. In particular, in [13], the
authors show that in order to gain speedups in
saturation-based parallel symbolic verification, one has
to pay a penalty in memory usage of up to 10 times, as
compared to the sequential algorithm. Other efforts range
from simple approaches that essentially implement BDDs as
two-tiered hash tables [14, 15], to sophisticated
approaches relying on slicing BDDs [16] and techniques for
work stealing [17]. However, the resulting implementations
show only limited speedups.


8  Conclusion  and  Future  Work 

In  this  paper,  we  focused  on  one  of  the  main  com¬ 
plexity  barriers,  resolution  of  deadlock  states,  in  auto¬ 
mated  addition  of  fault-tolerance  to  distributed  programs. 
Our  approach  was  based  on  parallelization  with  multiple 
threads.  We  considered  parallelization  in  two  scenarios: 
(1)  adding  recovery  transitions,  and  (2)  eliminating  dead¬ 
lock  states.  With  the  parallelization  of  these  scenarios, 
we  gain  a  significant  speedup.  As  expected,  most  of  the 
speedup  was  due  to  reduction  in  time  to  eliminate  dead¬ 
lock  states.  We  also  demonstrated  that  we  gained  super- 
linear  speedup  due  to  partitioning  deadlock  states  that 
reduces  the  size  of  corresponding  BDDs. 

While parallelization reduces the time spent in
eliminating deadlock states, it may also lead to
inconsistencies that have to be resolved. The time for
resolving such inconsistencies is one of the bottlenecks
in parallelization, as they are resolved sequentially. We
note that the synchronization on visited states was also
added, in part, to reduce inconsistencies among
threads  by  requiring  them  to  coordinate  with  each  other. 

Our  approach  provides  each  thread  with  its  own  copy 
of  shared  variables.  Although  this  has  a  potential  to  in¬ 
crease  the  memory  usage,  our  experiments  show  that  the 
actual  memory  usage  is  low.  In  general,  synthesis  prob¬ 
lems  tend  to  have  a  higher  time  complexity  than  the  cor¬ 
responding  verification  problems.  Hence,  we  expect  that 
a symbolic synthesis algorithm will run out of time before
it runs out of memory, and the increased space complexity
is unlikely to be the bottleneck during synthesis.

One direction for future work is to identify the trade-off
involved in additional synchronization among threads.
While this may reduce concurrency among threads, it may
also reduce the time for resolving inconsistencies.
Another direction is parallelization of the other
complexity barrier, fault-span generation.